Europäisches Patentamt
European Patent Office
Office européen des brevets




Publication number: 0 644 695 A2



Application number: 94306748.8
Date of filing: 14.09.94



EUROPEAN PATENT APPLICATION 

Int. Cl.: H04N 7/24



Priority: 21.09.93 US 124917

Date of publication of application:
22.03.95 Bulletin 95/12

Designated Contracting States:
DE FR GB IT NL

Applicant: AT&T Corp.
32 Avenue of the Americas
New York, NY 10013-2412 (US)

Applicant: BELL COMMUNICATIONS
RESEARCH, INC.
290 West Mt. Pleasant Avenue
Livingston, New Jersey 07039-2729 (US)



Inventor: Puri, Atul
3660 Waldo Avenue 1A,
Riverdale
New York 10463 (US)
Inventor: Hafong Wong, Andria
5B Eaton Crest Drive
Eatontown, New Jersey 07724 (US)

Representative: Buckley, Christopher Simon
Thirsk et al
AT&T (UK) LTD.,
AT&T Intellectual Property Division,
5 Mornington Road
Woodford Green, Essex IG8 0TU (GB)






Spatially scalable video encoding and decoding.

Video images of varying resolutions are derived from one video signal
(VIDIN) with high bandwidth efficiency by employing a new two-layer
video coding technique using spatial scalability in which the
prediction taken from one layer (1140) is combined with the prediction
taken from the other layer (1180), and the combined prediction is used
to code one of the layers (in 1180). In an illustrative example of the
invention employing a base-layer and an enhancement-layer, the
spatially interpolated base-layer (on 1170) is combined, by the
selection of appropriate weights, with the motion compensated temporal
prediction of the enhancement layer to generate the prediction used to
encode the enhancement-layer (in 1180). Weights are selected based on a
calculation of the sum of the absolute differences, or the sum of the
squares of the differences, between the prediction and reference
macroblocks to produce the prediction giving the best bandwidth
efficiency. This weighting process is called spatio-temporal weighting.

Technical Field

This invention relates to encoding and decoding of video signals and,
more particularly, to efficient encoding of video signals in a scalable
manner which permits video images to be decoded in a variety of
resolution scales and picture formats.

Background 

Worldwide efforts are underway to improve the quality of video signal
production, transmission, and reproduction because a great deal of
commercial importance is being predicted for improved quality video
systems. These efforts involve, at least in part, increasing the
resolution with which images are converted into representative
electrical signals, typically in the form of digital bit-streams, by
increasing the spatial and temporal sampling rates that are used to
convert video images into electrical signals. This increase in
resolution consequently means that more data about images must be
produced, processed, and transmitted in a given time interval.

Video images, such as those images in the field of a television camera,
are scanned at a predetermined rate and converted into a series of
electrical signals, each electrical signal representing a
characteristic of a predetermined region of the image generally known
as a picture element, pel, or pixel. Picture elements are typically
grouped into macroblocks for most video signal processing purposes,
where each macroblock consists of a 16 by 16 array of picture elements.
A plurality of macroblocks taken together at a predetermined instant in
time forms what amounts to a still picture (i.e., a frame) representing
the nature of the image at the predetermined instant of time.
Increasing the quality of video signals produced in this manner
involves, at least in part, the use of a larger number of smaller-size
picture elements to represent a given image frame and the production of
a larger number of image frames per unit time.

As the number of pels for each video image increases and the rate at
which images are produced increases, there is an increasing amount of
video data which must be produced, transmitted, received and processed
in a given time interval. A number of video compression schemes have
been proposed which attempt to transmit higher quality video images
using the same number of bits and the same bit rates used for lower
quality images. The Motion Pictures Expert Group Phase 1 (MPEG-1)
standard provides a particular syntax and decoding process for one such
scheme. This standard is set forth in International Standards
Organization (ISO) Committee Draft 11172-2, "Coding of Moving Pictures
and Associated Audio for Digital Storage Media at up to 1.5 Mbits/s,"
November, 1991.

It may be desirable to obtain one or more lower resolution images from
a single transmitted high-resolution video signal. For example, a video
signal simultaneously transmitted to both high-definition television
(HDTV) and standard NTSC television receivers may have to provide
images having a very high degree of resolution to the HDTV receivers
and images having a lesser degree of resolution to the standard
receivers. Similarly, the degree of image resolution which needs to be
obtained from a video signal displayed on a windowed computer screen
must be varied with the size of the particular window in which it is
displayed. Other applications in which multiple-resolution images are
desirable include video conferencing, where different video equipment
may be employed at each location, and video transmitted over
asynchronous transfer mode (ATM) networks.

One known method of providing a video signal from which images of
varying resolution may be derived is to simultaneously transmit a set
of independent replicas of a video sequence, each replica being scaled
for reproduction at a different level of resolution. This approach,
known as "simulcasting," is simple, but it requires increased bandwidth
to accommodate the transmission of multiple independent video images. A
more bandwidth efficient alternative to simulcasting is scalable video.
Resolution scalable video is a technique in which a video signal is
coded and the resulting bit-sequence is partitioned so that a range of
resolution levels may be derived from it depending upon the particular
signal decoding scheme employed at the receiver.

Resolution scalable video coding may be achieved either in the spatial
or frequency domain. Spatial scalability uses layered coding, typically
including a base-layer and an enhancement-layer in the spatial domain,
where there is a loose coupling between the layers; that is, the coding
algorithms used to code the layers are independent, but the enhancement
layer is coded using the reconstructed images produced by the base
layer. The coding scheme used for the two layers can also be chosen
independently, as can the particular methods of up and down sampling.

Unfortunately, the coding of resolution scalable video is not provided
within the constraints of most video standards. A particular limitation
of the MPEG-1 coding standard is its lack of provisions facilitating
resolution scalable video encoding and decoding.






Summary 



Video images of varying resolutions are derived from one video signal
with high bandwidth efficiency, in accordance with the principles of
the invention, by employing a new two-layer video coding technique
using spatial scalability in which the prediction taken from one layer
is combined with the prediction taken from the other layer, and the
combined prediction is used to code one of the layers.

In an illustrative example of the invention employing a base-layer and
an enhancement-layer, the spatially interpolated base-layer is
combined, by the selection of appropriate weights, with the motion
compensated temporal prediction of the enhancement-layer to generate
the prediction used to encode the enhancement-layer. Weights are
selected based on a calculation of the sum of the absolute differences,
or the sum of the squares of the differences, between the prediction
and reference macroblocks to produce the prediction giving the best
bandwidth efficiency. This weighting process is called spatio-temporal
weighting.

Other aspects of illustrative examples of the invention include using
spatio-temporal weighting where the base-layer and enhancement-layer
have particular picture formats that may be required in certain
applications of the invention. For example, in one application the
base-layer may need to be in an interlaced format, while in another a
progressive format is required. Thus, there are four illustrative forms
of spatial scalability employing aspects of the invention that result
from the various combinations of base-to-enhancement-layer picture
formats. These are: progressive-to-progressive,
progressive-to-interlace, interlace-to-progressive, and
interlace-to-interlace.

The invention provides substantial improvements over prior art
techniques of resolution scalable video. For example, bandwidth
efficiency is increased; an optimized set of weights can be selected
for each form of spatial scalability; the layers can be coded to
provide for compatibility between a number of different coding
standards; and the layers may be readily prioritized for transmission
on networks using multiple priorities for more robust and
error-resilient transmission.

The discussion in this Summary and the following Brief Description of
the Drawings, Detailed Description, and drawings merely represents
examples of this invention and is not to be considered in any way a
limitation on the scope of the exclusionary rights conferred by a
patent which may issue from this application. The scope of such
exclusionary rights is set forth in the claims at the end of this
application.


Brief Description of the Drawings 

FIG. 1 shows, in simplified block diagram form, an illustrative
two-layer encoder and decoder embodying aspects of the invention.

FIG. 2 shows a block diagram of the decimation operation (DEC) used in
the progressive-to-progressive and progressive-to-interlace forms of
spatial scalability.

FIG. 3 shows a block diagram of the interpolation operation (INTP) used
in the progressive-to-progressive and progressive-to-interlace forms of
spatial scalability.

FIG. 4 shows a block diagram of the decimation operation (DEC) used in
the interlace-to-progressive form of spatial scalability.

FIG. 5 shows a block diagram of the interpolation operation (INTP) used
in the interlace-to-progressive form of spatial scalability.

FIG. 6 shows a block diagram of the decimation operation (DEC) used in
the interlace-to-interlace form of spatial scalability.

FIG. 7 shows a block diagram of the interpolation operation (INTP) used
in the interlace-to-interlace form of spatial scalability.

FIG. 8 shows the details of the interlace-to-progressive interpolation
operation used in the interlace-to-interlace form of spatial
scalability.

FIG. 9 shows the details of the interlace-to-progressive interpolation
operation used in the interlace-to-progressive form of spatial
scalability.

FIG. 10 shows the principles behind the weighted spatio-temporal
prediction for the progressive-to-progressive and
interlace-to-progressive forms of spatial scalability in accordance
with an aspect of the invention.

FIG. 11 shows the principles behind the weighted spatio-temporal
prediction for the progressive-to-interlace and interlace-to-interlace
forms of spatial scalability in accordance with an aspect of the
invention.

FIG. 12 shows a diagram of the base-layer encoder and an
enhancement-layer encoder used in the illustrative embodiment of
FIG. 1.

FIG. 13 shows a two-layer decoder corresponding to the illustrative
embodiment of FIG. 1.

FIG. 14 shows a simplified block diagram of a spatio-temporal weighter
embodying aspects of the invention.

FIG. 15 shows a simplified block diagram of a spatio-temporal analyzer
embodying aspects of the invention.

The following abbreviations have been used in 
the drawings listed above: 
BF - buffer
COMP - comparator
DEC - decimation
INTP - interpolation
MC - motion compensation
ME - motion estimation
mv - motion vectors
ORG - organizer
Q - quantizer
IQ - inverse quantizer
QA - quantizer adapter
T - transform (e.g., a Discrete Cosine Transform [DCT])
IT - inverse transform
VFE - variable and fixed length encoder
VFD - variable and fixed length decoder
STA - spatio-temporal analyzer
PS - previous picture store
NS - next picture store
SW - switch
WT - weighter

Detailed Description

FIG. 1 shows, in simplified block diagram form, an illustrative
two-layer encoder and decoder, embodying aspects of the invention,
including enhancement-layer encoder 1180, base-layer encoder 1140,
enhancement-layer decoder 1340, base-layer decoder 1300, and other
elements. Enhancement-layer encoder 1180, base-layer encoder 1140,
enhancement-layer decoder 1340, and base-layer decoder 1300, and the
functions contained therein, are described in detail below. High
resolution video signal VIDIN enters on input line 1100 and passes to
spatial decimator 1120 on line 1110, where high resolution video signal
VIDIN may be low-pass filtered before spatial decimator 1120 reduces
the number of picture elements to a lower resolution called the
base-layer resolution. High resolution video signal VIDIN may be
formatted as either interlaced or progressive. It will be appreciated
by those skilled in the art that it may be desirable, in some examples
of the invention, to use frame-picture coding for interlaced video
signals in accordance with the Motion Picture Experts Group Phase 2
Test Model 5 Draft Version 2, Doc. MPEG93/225, April 1993 (MPEG-2).
Alternatively, it may be desirable to employ field-picture coding in
accordance with MPEG-2. The operation of reducing the number of picture
elements is called decimation (DEC). Although decimators are well known
in the art, specific methods of decimation, in accordance with aspects
of the invention, are discussed in detail below. The decimated
base-layer signal is then output on line 1130 and passes to base-layer
encoder 1140, which outputs encoded bit-stream BL on line 1190.

Base-layer encoder 1140 also outputs a locally decoded base-layer video
picture on line 1150 to spatial interpolator 1160. Spatial interpolator
1160 increases the number of pels per frame using a method of
upsampling interpolation (INTP). Although interpolators are well known
in the art, specific methods of upsampling interpolation, in accordance
with aspects of the invention, are discussed in detail below. The
upsampled enhancement-layer signal is output on line 1170 to
enhancement-layer encoder 1180, which outputs encoded bit-stream EL on
line 1200. Enhancement-layer encoder 1180 utilizes the upsampled signal
from line 1170 as a prediction, in order to advantageously increase the
efficiency of coding the high resolution video signal input on input
line 1100.
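
The encoder-side signal flow just described can be sketched in a few lines of Python. This is a minimal illustration only: the helper names (`decimate`, `interpolate`, `toy_code`, `encode_two_layer`, `decode_two_layer`), the 2:1 sampling ratio, the 2x2 averaging filter, and the quantizing stand-in for the base-layer codec are all assumptions made for the sketch and are not taken from the text.

```python
# Illustrative sketch only. Helper names, the 2:1 ratio and the toy
# quantizing "codec" are assumptions, not the described encoders.

def decimate(frame):
    """2:1 decimation in both directions by averaging each 2x2 block."""
    return [
        [(frame[y][x] + frame[y][x + 1]
          + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
         for x in range(0, len(frame[0]), 2)]
        for y in range(0, len(frame), 2)
    ]

def interpolate(frame):
    """1:2 upsampling in both directions by sample replication."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def toy_code(frame, step):
    """Stand-in for base-layer encode plus local decode (uniform quantizer)."""
    return [[(v // step) * step for v in row] for row in frame]

def encode_two_layer(vidin, step=8):
    base_in = decimate(vidin)              # role of spatial decimator 1120
    base_rec = toy_code(base_in, step)     # role of base-layer encoder 1140
    prediction = interpolate(base_rec)     # role of spatial interpolator 1160
    # The enhancement layer codes only what the prediction misses.
    residual = [[a - p for a, p in zip(ra, rp)]
                for ra, rp in zip(vidin, prediction)]
    return base_rec, residual

def decode_two_layer(base_rec, residual):
    prediction = interpolate(base_rec)     # duplicate interpolator at the decoder
    return [[p + r for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, residual)]
```

Because the residual in this sketch is left unquantized, decoding reproduces the input exactly; a real enhancement-layer encoder would transform and quantize the residual as well.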

Encoded bit-streams BL and EL at the output of base-layer encoder 1140
and enhancement-layer encoder 1180 are combined in multiplexer 1250 in
preparation for transmission on channel 1260. Alternatively,
bit-streams BL and EL could be sent on two separate and independent
channels. The encoding of high-resolution video signal VIDIN into
bit-streams BL and EL advantageously allows the use of prioritization
for transmission on networks using multiple priorities, which
facilitates more robust and error-resilient transmission.

If bit-streams BL and EL are multiplexed, then, after transmission on
channel 1260, demultiplexer 1270 separates bit-streams BL and EL and
outputs bit-streams BL and EL on lines 1290 and 1280, assuming there
are no transmission errors on channel 1260.

Bit-streams BL and EL are input into base-layer decoder 1300 and
enhancement-layer decoder 1340 on lines 1290 and 1280, respectively.
Base-layer decoder 1300 outputs a decoded base-layer video signal
VIDOUTb on line 1310, which, in the absence of transmission errors, is
exactly the same as the replica decoded video signal on line 1150.

Decoded base-layer video signal VIDOUTb is also input on line 1315 to
spatial interpolator 1320, which is a duplicate of interpolator 1160
and which produces an upsampled signal on line 1330. In the absence of
transmission errors, the upsampled video signals on lines 1330 and 1170
are identical. Enhancement-layer decoder 1340 utilizes the upsampled
video on line 1330 in conjunction with the enhancement-layer bit-stream
on line 1280 to produce a decoded higher resolution video signal
VIDOUTe on output line 1350.

It will be appreciated by those skilled in the art that it may be
desirable, in certain applications, for high resolution video input
signal VIDIN at input line 1100 to be of progressive format, whereas in
other applications it may be desirable for VIDIN to be of interlaced
format. Thus, four forms of spatial scalability, which depend on the
base-layer-to-enhancement-layer picture formats, may be used to
illustrate the principles of the invention. The illustrative forms of
spatial scalability are called progressive-to-progressive,
progressive-to-interlace, interlace-to-progressive and
interlace-to-interlace. Depending on the form of spatial scalability,
the spatial decimation (DEC) and spatial interpolation (INTP)
operations discussed above may be different. The DEC and INTP
operations necessary for each of the four aforementioned forms of
spatial scalability and the potential application of each are discussed
below.

FIGS. 2 and 3 show the decimation (DEC) and interpolation (INTP)
operations required for the illustrative progressive-to-progressive and
progressive-to-interlace forms of spatial scalability. As shown in
FIG. 2, in the DEC operation, high resolution video signal in
progressive format is input on line 2110 to horizontal and vertical
decimator 2115, where high resolution video signal SIGINd may be
filtered before horizontal and vertical decimator 2115 reduces the
number of picture elements contained in high resolution video signal
SIGINd in both the horizontal and vertical directions by reducing the
rate at which high resolution video signal SIGINd is sampled. The lower
spatial resolution video SIGOUTd is output on line 2130. In some
applications it may be desirable that no reduction in sampling rate
below 1:1 occur in either the horizontal or the vertical directions.
The rate at which horizontal and vertical decimator 2115 samples high
resolution video signal SIGINd is expressed as a ratio between two
integers, for instance 2:1. Thus horizontal decimation requires
reduction of the sampling rate by a factor of 2 horizontally.

As shown in FIG. 3, in the INTP operation, low resolution video signal
SIGINi in progressive format enters on line 3150 to horizontal and
vertical interpolator 3155, which performs the inverse operation of
that performed by horizontal and vertical decimator 2115 (FIG. 2).
However, since a loss of spatial resolution occurs in horizontal and
vertical decimator 2115, horizontal and vertical interpolator 3155 can
only output an approximation of the signal at the input to decimator
2115 (FIG. 2) as SIGOUTi on line 3170.
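
Why interpolation can only approximate the decimator's input can be seen on a single line of samples. The sketch below is an assumption-laden illustration: the function names, the pair-averaging filter, and the linear interpolation are chosen for brevity and are not the filters of FIGS. 2 and 3.

```python
# Illustrative 1-D round trip; filters and names are assumptions.

def decimate_2to1(samples):
    """Average adjacent pairs (a crude low-pass filter), one output per pair."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]

def interpolate_1to2(samples):
    """Linear interpolation to twice the length; the last sample is repeated."""
    out = []
    for i, v in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else v
        out.extend([v, (v + nxt) / 2])
    return out

detailed = [10, 10, 200, 200, 10, 10, 200, 200]   # fine horizontal detail
smooth = [50] * 8                                  # no detail at all

approx = interpolate_1to2(decimate_2to1(detailed))
print(approx == detailed)                                    # False
print(interpolate_1to2(decimate_2to1(smooth)) == smooth)     # True
```

A flat signal survives the round trip, while the detailed one does not; the difference is what the enhancement layer has to carry.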

An example of the progressive-to-progressive form of spatial
scalability is a coding scheme where a Common Intermediate Format (CIF)
type signal is input to base-layer encoder 1140 (FIG. 1) at a
resolution of 352 horizontal x 288 vertical picture elements per
progressive frame, and a Super Common Intermediate Format (SCIF) type
signal is input to enhancement-layer encoder 1180 (FIG. 1) at a
resolution of 704 horizontal x 576 vertical picture elements per
progressive frame, with both input signals having a frame rate of 30
frames/sec. CIF and SCIF formatted signals are well known in the art.
The CIF signal input to base-layer encoder 1140 (FIG. 1) is derived
from the SCIF signal by spatial decimation using a factor of 2 in both
the horizontal and vertical directions. Locally decoded base-layer
frames are interpolated by a factor of 2 both horizontally and
vertically and are used in the prediction used to encode the signal in
the enhancement-layer. Although this example requires a factor of 2 for
horizontal and vertical decimation and interpolation, other integer
ratios can also be advantageously used, as will be appreciated by those
skilled in the art. In this example, a Motion Pictures Expert Group
Phase 1 (MPEG-1) or CCITT Recommendation H.261, Video Codec for
Audiovisual Services at p x 64 kbit/s, Geneva, August, 1990 (H.261)
coding scheme is used to code the base-layer to illustrate how spatial
scalability, in accordance with an aspect of the invention, can
advantageously permit compatibility between the MPEG-1, or H.261, and
MPEG-2 standards. It is also possible, and may be desirable in some
applications, to use MPEG-2 coding in both layers. For example, the
base-layer may employ MPEG-2 main-profile coding and the
enhancement-layer may employ MPEG-2 next-profile spatially scalable
coding. Both main-profile and next-profile coding schemes are known in
the art.
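
As a quick check on the sizes involved, the number of 16 by 16 macroblocks per frame in each layer of this example can be computed directly. The function name `macroblocks` is introduced here for illustration only.

```python
MACROBLOCK = 16  # macroblock edge in picture elements, as stated in the Background

def macroblocks(width, height):
    """Count 16x16 macroblocks per frame (dimensions assumed multiples of 16)."""
    return (width // MACROBLOCK) * (height // MACROBLOCK)

cif = (352, 288)     # base-layer resolution in this example
scif = (704, 576)    # enhancement-layer resolution in this example

print(macroblocks(*cif))    # 396
print(macroblocks(*scif))   # 1584
```

With a factor of 2 in each direction, the enhancement layer carries four times as many macroblocks per frame as the base layer.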

An example of progressive-to-interlace scalability is a coding scheme
where a Source Input Format (SIF) type signal is input to base-layer
encoder 1140 (FIG. 1) at a resolution of 352 horizontal x 240 vertical
picture elements per noninterlaced frame, and a Comité Consultatif
International des Radiocommunications Recommendation 601, Standard
4:2:0 (CCIR-601 4:2:0) type signal is input to enhancement-layer
encoder 1180 (FIG. 1) at a resolution of 704 horizontal x 480 vertical
picture elements per interlaced frame. The SIF signal input to
base-layer encoder 1140 (FIG. 1) is derived from the CCIR-601 4:2:0
signal by dropping the even-fields, followed by spatial decimation by a
factor of 2 in the horizontal direction. Locally decoded base-layer
frames are upsampled by a factor of 2 horizontally and vertically and
used for the prediction used to encode the enhancement-layer. In this
example, an MPEG-1 coding scheme is used to encode the base-layer, as
in the above example, to illustrate how spatial scalability, in
accordance with an aspect of the invention, can advantageously permit
compatibility between the MPEG-1 and MPEG-2 standards. It is also
possible, and may be desirable in some applications, to use MPEG-2
coding in both layers. For example, the base-layer may employ MPEG-2
main-profile coding and the enhancement-layer may employ MPEG-2
next-profile spatially scalable coding.
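
The derivation of the SIF base-layer input from a CCIR-601 frame can be sketched as follows. The function names, the convention that the first frame line belongs to the odd field, and the pair-averaging horizontal filter are assumptions made for this sketch.

```python
# Illustrative sketch; field-parity convention and filter are assumptions.

def drop_even_field(frame):
    """Keep the odd field, taken here as frame lines 1, 3, 5, ... (indices 0, 2, 4, ...)."""
    return frame[0::2]

def decimate_horizontal(frame):
    """2:1 horizontal decimation by averaging adjacent sample pairs."""
    return [[(row[x] + row[x + 1]) // 2 for x in range(0, len(row), 2)]
            for row in frame]

def ccir601_to_sif(frame):
    return decimate_horizontal(drop_even_field(frame))

frame = [[y] * 704 for y in range(480)]   # dummy 704 x 480 interlaced frame
sif = ccir601_to_sif(frame)
print(len(sif[0]), len(sif))              # 352 240
```

Dropping one field already halves the vertical resolution, which is why only horizontal decimation is applied afterwards.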

FIGS. 4 and 5 show the DEC and INTP operations required for the
illustrative interlace-to-progressive form of spatial scalability. As
shown in FIG. 4, in the DEC operation, high resolution video signal
SIGINd2 is input on line 4110 to horizontal and vertical decimator
4115, which reduces the number of picture elements in the horizontal
and vertical directions by reducing the rate at which the high
resolution video signal is sampled. A lower spatial resolution
progressive video signal is output on line 4105. In some applications,
it may be desirable that the sampling rate not be reduced below 1:1 in
either the horizontal or the vertical directions. The sampling rate
used by horizontal and vertical decimator 4115 is expressed as a ratio
between two integers, for instance 2:1. Horizontal decimation requires
reduction of the sampling rate by a factor of 2 horizontally. Next,
progressive frames at line 4105 further undergo a
progressive-to-interlace decimation operation that is well known in the
art. Lower spatial resolution interlaced frames are then output on line
4130 as SIGOUTd2.

As shown in FIG. 5, in the INTP operation, low resolution interlaced
video signal SIGINi2 enters on line 5150 to interlace-to-progressive
interpolator 5165. Progressive lower resolution frames on line 5145 are
output by interlace-to-progressive interpolator 5165 and are fed as an
input to horizontal and vertical interpolator 5155, which performs the
inverse operation of that performed in horizontal and vertical
decimator 4115 (FIG. 4). However, since a loss of spatial resolution
occurs in horizontal and vertical decimator 4115, horizontal and
vertical interpolator 5155 can only output, on line 5170 as SIGOUTi2,
an approximation of the signal input on line 4110 (FIG. 4) to
horizontal and vertical decimator 4115 (FIG. 4).

An example of interlace-to-progressive scalability is a scalable coding
scheme where the base-layer employs the main-profile coding of CCIR-601
4:2:0 resolution interlaced frames and the enhancement-layer employs
progressive high definition television (HDTV) resolution frames at 60
frames/sec. CCIR-601 4:2:0 main-profile and HDTV formatted signals are
well known in the art. Such interlace-to-progressive spatial
scalability can be advantageously used to achieve digital
progressive-HDTV compatibility with standard digital TV. In this
example, MPEG-2 main-profile encoding can be employed in the base-layer
and the enhancement-layer can employ MPEG-2 next-profile spatially
scalable coding.

FIGS. 6 and 7 show the DEC and INTP operations required for the
illustrative interlace-to-interlace form of spatial scalability. As
shown in FIG. 6, for decimation of an interlaced signal to generate a
lower spatial resolution interlaced signal, the first step involves an
interlace-to-progressive interpolation operation. Assume, for purposes
of this example, that input signal SIGINd3 is 30 frames/sec interlaced,
so that it can alternatively be viewed as 60 fields/sec interlaced
video where each field contains half the lines of a frame.
Interlace-to-progressive interpolator 6165 generates progressive frames
from the 60 fields/sec interlaced video signal input on line 6110 such
that the same number of lines as the interlaced frame is output at 60
frames/sec on line 6105. Following interlace-to-progressive
interpolation, the signal is decimated in horizontal and vertical
decimator 6115, which outputs the decimated signal on line 6130 as
SIGOUTd3. If a lower resolution interlaced signal with half the
horizontal and vertical resolution of the original input signal is
desired, then decimation factors of 2:1 horizontally and 4:1 vertically
may be employed.
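
The 2:1 horizontal and 4:1 vertical factors follow from simple bookkeeping on line counts, which the sketch below makes explicit. The function name and the 704 x 480 example dimensions are assumptions for illustration; the sketch tracks sizes only and does not perform any filtering or line-parity selection.

```python
# Dimension bookkeeping only; no filtering is modeled here.

def interlace_to_interlace_dims(width, frame_lines, h_factor=2, v_factor=4):
    field_lines = frame_lines // 2        # each field holds half the frame's lines
    progressive_lines = frame_lines       # deinterlacing yields a full frame per field time
    out_field = (width // h_factor, progressive_lines // v_factor)
    out_frame = (out_field[0], out_field[1] * 2)   # two fields per interlaced frame
    return field_lines, out_field, out_frame

print(interlace_to_interlace_dims(704, 480))
# (240, (352, 120), (352, 240))
```

Each 60 frames/sec progressive frame is reduced to one low-resolution field, so a 4:1 vertical factor on the deinterlaced frame gives an output frame with half the original number of lines.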

FIG. 7 shows the INTP operation used in the illustrative
interlace-to-interlace form of spatial scalability. Again, for purposes
of this example, lower spatial resolution interlaced signal SIGINi3 at
30 frames/sec is viewed as a 60 fields/sec signal. The first step is
identical to that in the decimation operation described above in
reference to FIG. 6 in that it involves an interlace-to-progressive
interpolation operation on input signal SIGINi3 on line 7150, which
results in a 60 frames/sec progressive video output on line 7175. In
the next step, the progressive frames generated by even-fields are
available on line 7195 after passing through switch 7185 and are
vertically interpolated by 1:2 and resampled by vertical interpolator
and line selector 7235. The progressive frames generated by odd-fields
are available on line 7205 after passing through switch 7185 and are
not resampled in this step. Next, switch 7255 alternately selects the
signals on lines 7205 and 7145. In the final step, the output of switch
7255 is fed via line 7145 to 1:2 horizontal interpolator 7155.

It will be helpful, at this point, to describe the
interlace-to-progressive interpolation operation employed in the DEC
and INTP operations in more detail. The interlace-to-progressive
interpolation operation used in DEC and INTP can be quite different
than that for interlace-to-interlace because, while the INTP operation
is specified by the MPEG-2 standard, the DEC operation is outside of
the standard and can thus be more complicated.

FIGS. 8 and 9 show the interlace-to-progressive interpolation operation
included in the MPEG-2 standard. FIG. 8 shows interlace-to-progressive
interpolation of field1 (i.e., the odd-field) of an interlaced frame in
a sequence of frames. FIG. 9 shows the interlace-to-progressive
interpolation of field2 (i.e., the even-field) of an interlaced frame
in a sequence of frames.

FIG. 8 shows the details of the interlace-to-progressive interpolation
operation used in the interlace-to-interlace form of spatial
scalability. In FIG. 8, lines A, C, E, G ... belong to the odd-field
and lines B', D', F', H' ... are generated by application of a filter.

FIG. 9 shows the details of the interlace-to-progressive interpolation
operation used in the interlace-to-progressive form of spatial
scalability. In FIG. 9, lines B, D, F, H ... belong to the even-field
and lines A', C', E', G' ... are generated by the application of the
interlace-to-progressive interpolation filter. The output of the
interlace-to-progressive interpolation filter is composed of two
contributions, one from the field being deinterlaced, and the other
from an opposite parity field within the same frame. Both contributions
are generated by applying weighting factors to samples of neighboring
lines centered at the deinterlaced line to be generated, as shown by
the arrows in FIGS. 8 and 9. This filtering operation thus maintains a
compromise between retaining the vertical and the temporal resolution.
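
The two-contribution structure of such a filter can be illustrated with a deliberately simplified sketch. The function name, the single opposite-field line, and the weight `alpha` are hypothetical; the actual weighting factors are those indicated in FIGS. 8 and 9 and are not reproduced here.

```python
# Simplified sketch of a two-contribution deinterlacing filter.
# alpha and the one-line opposite-field tap are hypothetical choices.

def deinterlace_line(above, below, opposite, alpha=0.25):
    """Estimate one missing line of the field being deinterlaced.

    above, below: neighboring lines from the field being deinterlaced.
    opposite: co-sited line from the opposite-parity field of the same frame.
    alpha: hypothetical weight given to the opposite-field contribution.
    """
    return [(1 - alpha) * (a + b) / 2 + alpha * o
            for a, b, o in zip(above, below, opposite)]

print(deinterlace_line([0, 8], [8, 16], [16, 4]))   # [7.0, 10.0]
```

Raising `alpha` leans on the opposite-parity field and so favors vertical detail; lowering it leans on the current field and favors temporal accuracy, which is the compromise the text refers to.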

An example of interlace-to-interlace scalability is a scalable coding
scheme where base-layer encoder 1140 (FIG. 1) encodes CCIR-601 4:2:0
resolution interlaced frames and enhancement-layer encoder 1180
(FIG. 1) encodes HDTV resolution interlaced frames. Such
interlace-to-interlace spatial scalability can be advantageously used
to achieve digital interlaced-HDTV compatibility with standard digital
TV.

FIG. 10 shows the principles behind the weighted spatio-temporal
prediction for the progressive-to-progressive and
interlace-to-progressive forms of spatial scalability in accordance
with an aspect of the invention. For progressive format video signals,
FIG. 10 shows that the spatio-temporal weighted prediction is obtained
by upsampling the locally-decoded picture from the base-layer to form
the spatial prediction and combining it with the motion-compensated
temporal prediction in the enhancement-layer. The weight code W simply
represents weights w, where w is the spatial weight applied to all the
lines of each block. The spatio-temporal weighted prediction is
obtained by weighting the spatial-prediction block by w and adding to
it the temporal prediction block weighted by a factor of 1-w.
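
Written as an equation, the weighting described above takes the following form. The symbols S (spatial-prediction block), T (temporal-prediction block) and P (combined prediction) are introduced here for illustration only, and the range given for w is an assumption consistent with the example weights.

```latex
P(x, y) = w \, S(x, y) + (1 - w) \, T(x, y), \qquad 0 \le w \le 1
```

A spatial weight of w = 1 gives a purely spatial prediction and w = 0 a purely temporal one.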

Tables 1 and 2 show spatial weights, for purposes of this example only
and not as a limitation on the invention, for
progressive-to-progressive and interlace-to-progressive scalability.
Besides the listed weights, purely temporal prediction (i.e., a spatial
weight of 0) may be used in each case.

Table 1, below, shows an example set of 2 weight
codes for progressive-to-progressive and
interlace-to-progressive scalability.



w1     w2
1      0
1/2    0
1      1/2
1/2    1/2



Table 4, below, shows an example set of 4 weight 
codes for progressive-to-interlace scalability, where
the base-layer encodes even-fields. 



w1     w2
0      1
0      1/2
1/2    1
1/2    1/2



w
1/2



Table 2, below, shows an example set of 4 weight
codes for progressive-to-progressive and
interlace-to-progressive scalability.






Tables 5 and 6 show spatial-weights, for purposes
of this example only and not as a limitation on the
invention, for progressive-to-interlace scalability
where the base-layer encodes odd- and even-fields
respectively. Besides the listed weights, purely temporal
prediction (i.e., a spatial-weight of (0,0)) may be
used.

Table 5, below, shows a set of 4 weight codes for
progressive-to-interlace scalability where the
base-layer encodes odd-fields.



w1



w2 



3/4 
1/2 



1/4 



Tables 3 and 4 show spatial-weights, for purposes
of this example only and not as a limitation on the
invention, for progressive-to-interlace scalability
where the base-layer encodes odd- and even-fields
respectively. Besides the listed weights, purely temporal
prediction (i.e., a spatial weight of (0,0)) may
also be used.

Table 3, below, shows an example set of 4 weight 
codes for progressive-to-interlace scalability, where
the base-layer encodes odd-fields. 






1      1/4
3/4    0
3/4    1/2
1/2    1/4



Table 6, below, shows a set of 4 weight codes, for
purposes of this example only and not as a limitation
on the invention, for progressive-to-interlace scalability,
where the base-layer encodes even-fields.

1/4


3/4 


0 


3/4 


1/2 


1/2 


1/4

Tables 7 and 8 show spatial-weights, for purposes
of this example only and not as a limitation on the
invention, for interlace-to-interlace scalability. Besides
the listed weights, a purely temporal prediction
(i.e., a spatial-weight of (0,0)) may be used.

Table 7, below, shows an example set of 2 weight 
codes for interlace-to-interlace scalability.



w1     w2
1/4    1
0      3/4
1/2    3/4
1/4    1/2

Table 8, below, shows an example set of 4 weight
codes for interlace-to-interlace scalability.



w1     w2
1      1
1/2    1/2



FIG. 12 shows a diagram of the base-layer encoder
and an enhancement-layer encoder used in the
illustrative embodiment of FIG. 1. A high resolution
video signal enters on input line 12100. Spatial
decimator 12110 reduces the number of pels per frame,
as described earlier when referring to FIG. 1, and
outputs a base-layer signal on line 12120 to base-layer
encoder 12201. Base-layer encoder 12201 uses the well
known MPEG-1 picture arrangement which, for
generality, codes I, B, and P pictures. Frame
reorganizer 12130 reorders the input frames in preparation
for coding in the manner well known in the art, and
outputs the result on lines 12140 and 12150.

Motion estimator 12170 examines the input
frame on line 12150 and compares it with one or two
previously coded frames. If the input frame is of type
I or P then one previous frame is used. If it is type B
then two previously coded frames are used. Motion
estimator 12170 outputs motion vectors on line
12175 for use by motion compensator 12180 and on
line 12185 for use by variable and fixed length
encoder 12310. Motion compensator 12180 utilizes motion
vectors mv and pels from previously coded frames to
compute (for P and B type frames) a motion compensated
prediction that is output on line 12230 and passes
to lines 12240 and 12250. For I type frames, motion
compensator 12180 outputs zero pel values.

Subtracter 12160 computes the difference between
the input frame on line 12140 and (for P and B
types) the prediction frame on line 12250. The result
appears on line 12215, is transformed by transformer
12270 and quantized by quantizer 12290 into typically
integer values. Quantized transform coefficients
pass on line 12300 to variable and fixed length
encoder 12310 and on line 12305 to inverse quantizer
12380.

Inverse quantizer 12380 converts the quantized 
transform coefficients back to full range and passes 
the result via line 12390 to inverse discrete cosine 
transformer 12400, which outputs pel prediction error 
values on line 12410. Adder 12420 adds the prediction
error values on line 12410 to the prediction values
on line 12240 to form the coded base layer pels
on lines 12430 and 12440.
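
The prediction, quantization and local reconstruction path described above can be sketched as follows. This is a simplified illustration, not the encoder of FIG. 12: the transform and inverse transform stages are deliberately omitted, the quantizer is assumed to be a plain uniform quantizer with step size `qs`, and all function names and sample values are hypothetical.

```python
import numpy as np

def quantize(residual, qs):
    """Map prediction-error values to integer levels (assumed uniform quantizer)."""
    return np.rint(residual / qs).astype(np.int64)

def inverse_quantize(levels, qs):
    """Convert integer levels back to full-range prediction-error values."""
    return levels.astype(np.float64) * qs

def code_block(input_block, prediction, qs):
    """Return the quantized levels and the locally decoded block.

    Mirrors the subtracter -> quantizer -> inverse quantizer -> adder path;
    the transform and inverse transform are left out of this sketch.
    """
    residual = input_block.astype(np.float64) - prediction
    levels = quantize(residual, qs)
    reconstructed = prediction + inverse_quantize(levels, qs)
    return levels, reconstructed

# Hypothetical 2x2 input and prediction blocks.
input_block = np.array([[10, 20], [30, 40]])
prediction = np.array([[8, 21], [33, 36]])
levels, reconstructed = code_block(input_block, prediction, qs=2)
```

Because the encoder reconstructs from the quantized levels rather than from the original input, the picture it stores is the same one a decoder would obtain from the bit-stream.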

For I and P type frames, switch 12225 passes the
coded pels from line 12430 to the next-picture store
12200 via line 12205. Simultaneously, the frame that
was in next-picture store 12200 passes via line 12195
to previous-picture store 12190. For B type frames,
switch 12225 takes no action, and the contents of
picture stores 12190 and 12200 remain unchanged. The
contents of picture stores 12190 and 12200 pass to
motion estimator 12170 and motion compensator
12180 via lines 12210 and 12220 for use as needed.
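
The picture-store update rule described above can be sketched as a small Python class. The class name `PictureStores`, its attribute names, and the frame labels in the usage lines are hypothetical; only the rule itself (I and P frames shift the stores, B frames leave them unchanged) is taken from the text.

```python
class PictureStores:
    """Two reference stores: 'previous' and 'next' (hypothetical names)."""

    def __init__(self):
        self.previous = None
        self.next = None

    def update(self, picture_type, coded_frame):
        """Apply the store update for one coded frame."""
        if picture_type in ("I", "P"):
            # The frame held in the next-picture store moves to the
            # previous-picture store, and the new frame takes its place.
            self.previous = self.next
            self.next = coded_frame
        elif picture_type != "B":
            raise ValueError("picture type must be 'I', 'P' or 'B'")
        # B frames are never used as references, so nothing changes.

# Hypothetical coding order: I0, P3, B1, B2, P6.
stores = PictureStores()
for ptype, frame in [("I", "I0"), ("P", "P3"), ("B", "B1"), ("B", "B2")]:
    stores.update(ptype, frame)
refs_for_b = (stores.previous, stores.next)
stores.update("P", "P6")
```

In this ordering, both B frames are predicted from the same pair of references, which is why the stores must not change while they are coded.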

The quantizer step size QS that is used by quan- 
tizer 12290 and inverse quantizer 12380 is computed 
adaptively by quantization adapter 12360 depending 
on the buffer fullness indication on line 12350. Quan- 
tizer step size QS passes via line 12370 to quantizer 
12290 and via line 12375 to inverse quantizer 12380. 
Quantizer step size QS also passes to variable and 
fixed length encoder 12310 via line 12365. Variable 
and fixed length encoder 12310 encodes quantized 
transform coefficients input on line 12300, motion
vectors mv input on line 12185 and quantizer step
size QS input on line 12365 into a bit-stream that is
output on line 12320 into a buffer 12330 for temporary
storage until it passes via line 12340 to systems
multiplexer 12345. The coded base layer frames pass via
line 12440 to interpolator 12450, as described above,
where they are upsampled and passed to the
enhancement-layer encoder 12202 via line 12460.

In enhancement-layer encoder 12202, frame
organizer 12470 reorders the high resolution video
frames to match the order of the base-layer and
outputs reordered frames on line 12480. Subtracter
12490 computes the difference between the input
picture on line 12480 that is to be coded and the
spatio-temporal prediction picture on line 12460. The
prediction error is output on line 12500, transformed by
transformer 12510, quantized by quantizer 12530
and passed via line 12540 to variable and fixed length
encoder 12550. Quantizer step size QSe used by
enhancement-layer encoder 12202 is computed by
quantization adapter 12600 depending on an
indication of the fullness of buffer 12570 received on
line 12590. Quantizer step size QSe passes via line
12605 to quantizer 12530, via line 12610 to inverse
quantizer 12740 and on line 12615 to variable and
fixed length encoder 12550. Motion estimator 12640
examines the enhancement-layer input frame on line
12485, and depending on the picture type being coded,
compares it with either the previous decoded
enhancement-layer frame on line 12630 or with two
previous decoded enhancement-layer frames on lines
12630 and 12680. Motion estimator 12640 outputs
motion vectors mve on line 12650 for use by motion
compensator 12655 and on line 12645 for use by
variable and fixed length encoder 12550. Motion
compensator 12655 utilizes motion vectors mve to
compute a motion compensated temporal prediction that
is output on line 12700 and passes to weighter 12710.

The corresponding spatially interpolated base-layer
decoded frame is available on line 12460 and is
input to weighter 12710 on line 12690. The spatial
prediction frame at the output of the base-layer
interpolator 12450 is also applied to an input line 12665 of
spatio-temporal weighting analyzer 12685. The
temporal prediction frame at the output of the motion
compensator 12655 is also applied to another input
line 12675 of spatio-temporal weighting analyzer
12685. The input frame from frame organizer 12470
is fed to the third input line 12705 of spatio-temporal
weighting analyzer 12685. Spatio-temporal weighting
analyzer 12685 first selects a weighting table
depending on the type of spatial scalability, and next
computes an index to this prestored table indicating
the best weight or set of weights to be used. The
operation can be done once or more per picture.
Typically, when employing an MPEG-like coding environment,
spatio-temporal weights are adapted on a
macroblock by macroblock basis. The index to selected
weights for a macroblock appears on line 12695 and
is fed to weighter 12710. This index also appears on
line 12725 and is encoded as part of the bit-stream in
variable and fixed length encoder 12550. Weighter
12710 computes a weighted average of the two
predictions input on lines 12690 and 12700 and outputs
the result on lines 12720 and 12730 to subtracter
12490 and adder 12780, respectively.

The locally decoded enhancement-layer video,
which is needed for motion compensation of the next
enhancement-layer frame, is calculated in the same
way as in the base-layer except for a few differences.
Specifically, the quantized transform coefficients are
converted to full range by inverse quantizer 12740,
converted to prediction error pel values by inverse
transform 12760, added to the motion compensated
prediction by adder 12780 and passed to the
next-frame store 12620 whose contents can be simultaneously
copied to the previous-frame store 12660. If the
next frame is a P picture, contents of previous-frame
store are needed; if it is a B picture, contents of both
frame stores are needed for motion estimation. Variable
and fixed length encoder 12550 encodes quantized
transform coefficients input on line 12540, quantizer step
size QSe input on line 12615, motion vectors mve on line
12645 and index of weights on line 12725 into a
bit-stream that is output on line 12560. This bit-stream on
line 12560 then passes to buffer 12570 for temporary
storage until it passes via line 12580 to systems
multiplexer 12345.

For purposes of this example a simple encoder is 
used to illustrate the base and enhancement-layer 
encoders described above. However, it may be desirable
that the base-layer encoder be an MPEG-1 or H.261 
encoder or an MPEG-2 main-profile encoder. The en- 
hancement-layer encoder is assumed to be an 
MPEG-2 next-profile spatial scalability encoder 
which is similar to an MPEG-2 main-profile encoder 
except for weighter 12710 and spatio-temporal analyzer
12685, which are discussed in detail below.

FIG. 13 shows a two-layer decoder consisting of
base-layer decoder 13001 and enhancement-layer
decoder 13002 corresponding to the coding system of
FIG. 1. Base-layer decoder 13001 uses the well
known MPEG-1 picture coding arrangement, which
for generality, consists of I, B, and P pictures. The
received bit-stream on line 13340 passes from the
systems demultiplexer to buffer 13330 for temporary
storage until it passes via line 13320 to the variable
and fixed length decoder 13310. The variable and
fixed length decoder 13310 decodes quantized
transform coefficients and outputs them on line 13300.
Quantizer step size QSq is output on line 13370.
Motion vectors mvo are output on lines 13360 and 13175.
Motion compensator 13180 utilizes motion vectors
mvo on line 13175 and pels from previously decoded
frames on line 13210 for P pictures and previously
decoded frames on lines 13210 and 13220 for B
pictures to compute motion compensated prediction that
is output on line 13240. For I type frames, motion
compensator 13180 outputs zero pel values.

Quantizer step QSq passes from variable and
fixed length decoder 13310 via line 13370 to inverse
quantizer 13380. Quantized transform coefficients
pass on line 13300 to inverse quantizer 13380.
Inverse quantizer 13380 converts the quantized
transform coefficients back to full range and passes the
result via line 13390 to inverse transformer 13400,
which outputs pel prediction error values on line
13410. Adder 13420 adds the prediction error values
on line 13410 to the prediction values on line 13240
to form the decoded base-layer pels on lines 13430,
13140 and 13440. For I and P type frames, switch
13435 passes the decoded pels input on line 13430
via line 13205 to the next-picture store 13200.
Simultaneously, the frame in next-picture store 13200
passes via line 13195 to previous-picture store
13190. For B type frames, switch 13435 takes no
action, and the contents of picture stores 13190 and
13200 remain unchanged. The contents of picture
stores 13190 and 13200 pass to motion estimator
13170 and motion compensator 13180 via lines
13210 and 13220 for use as needed as described
herein. Frame organizer 13130 reorders the
base-layer decoded output frames on line 13140 in
preparation for display on line 13125 in the manner well
known in the art. The decoded base-layer frames
pass via line 13440 to interpolator 13450, as
described above, where they are upsampled and passed to
enhancement-layer decoder 13002 via line 13460.

The enhancement-layer bit-stream passes from
systems demultiplexer 13005 to buffer 13570 via line
13580 for temporary storage until it passes via line
13560 to variable and fixed length decoder 13550.
Variable and fixed length decoder 13550 decodes
quantized transform coefficients and outputs them
on line 13540. Quantizer step size QSqe is output on
line 13610, motion vectors mvoE are output on lines
13645 and 13650 and an index of weights is output
on lines 13725 and 13695. Quantizer step size QSqe
passes from line 13610 to inverse quantizer 13740.
The inverse quantizer 13740 converts the quantized
transform coefficients on line 13540 back to full
range and passes the result via line 13750 to inverse
transform 13760, which outputs pel prediction error
values on line 13770.

Motion compensator 13655 utilizes
enhancement-layer motion vectors mvoE on line 13650 and
pels from the previously decoded enhancement layer
frames on lines 13630 and 13680 to compute a
motion compensated prediction that is output on line
13700 and passes to weighter 13710. The decoded
base-layer frame is upsampled in interpolator 13450
and applied via line 13460 to the other input to the
weighter 13710. Weighter 13710 computes a weighted
average of the two predictions input on lines 13460
and 13700 and outputs the result on line 13720 to
adder 13780. The weighting used in computing the
prediction is the same as that used during the encoding
process. The weights are obtained in weighter 13710
by using the index of weights available at line 13695 to
look up values from a table. The output of adder
13780 is available on lines 13480 and 13790, with the
decoded frame on line 13790. If not a B picture, the output
is passed through switch 13810 to line 13815 and
stored in the next-picture store 13620 and its
contents are simultaneously copied to previous-picture
store 13660. The contents of picture stores 13660
and 13620 are used for motion-compensated
prediction of subsequent frames. Frame reorganizer 13470
reorders the high resolution video frames on line
13480 to match the order of the base-layer and
outputs the result on line 13135 for display.
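
The decoder-side use of the weight index can be sketched as follows. This is an illustration under stated assumptions: the contents of `WEIGHT_TABLE`, the function name `decode_enhancement_block` and the sample values are hypothetical, and a single spatial weight per block is assumed. The only behavior taken from the text is that the decoded index selects a weight from a table, the two predictions are averaged with that weight, and the prediction error is added to the result.

```python
import numpy as np

# Hypothetical single-weight table; index 2 stands for purely temporal prediction.
WEIGHT_TABLE = (1.0, 0.5, 0.0)

def decode_enhancement_block(residual, spatial_pred, temporal_pred,
                             weight_index, table=WEIGHT_TABLE):
    """Rebuild one enhancement-layer block from decoded data."""
    w = table[weight_index]                      # table look-up by decoded index
    spatial_pred = np.asarray(spatial_pred, dtype=np.float64)
    temporal_pred = np.asarray(temporal_pred, dtype=np.float64)
    prediction = w * spatial_pred + (1.0 - w) * temporal_pred
    return prediction + np.asarray(residual, dtype=np.float64)

# Hypothetical decoded values for a 1x2 block.
decoded = decode_enhancement_block(
    residual=[[1, -1]],
    spatial_pred=[[100, 50]],
    temporal_pred=[[60, 70]],
    weight_index=1,
)
```

Since only the index travels in the bit-stream, encoder and decoder must hold identical tables for the reconstruction to match.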

For purposes of this example a simple decoder is
used to illustrate the base and enhancement-layer
decoders described above. However, it may be
desirable that the base-layer decoder be an MPEG-1 or an
H.261 decoder or an MPEG-2 main-profile decoder.
The enhancement-layer decoder is assumed to be an
MPEG-2 next-profile spatial scalability decoder
which is very much like the MPEG-2 main-profile
decoder except for weighter 13710, which is discussed
in detail below.

FIG. 14 shows details of the spatio-temporal
analyzer employed in FIG. 12. Spatio-temporal analyzer
14000 takes the spatial prediction signal obtained by
interpolation of the base-layer signal on line 14690
and the enhancement-layer temporal prediction
signal on line 14700. The signal on line 14690 also
appears on line 14880 and the signal on line 14700 also
appears on line 14890, and these form two inputs to
spatio-temporal weighter 14920. The third input is
weight(s) W0 on line 14910, the first entry in the
weight look-up table 14870, whose contents are
available on line 14905. This process is repeated for
spatio-temporal weighter 14921, which takes the signals
on lines 14690 and 14700 at respective input lines
14881 and 14891, with weight W1 on line 14911, the
next entry in the weight look-up table 14870 whose
contents are available on line 14905. This process is
similarly repeated for all spatio-temporal weighters,
where the number of weighters depends on the number
of entries available to choose from in the weight
tables. The spatio-temporal weighted prediction
image-blocks are now available on lines 14930, 14931, ...
14934 and are differenced in differencers 14950,
14951, ... 14954 from original image-blocks available
on lines 14940, 14941, ... 14944. The resulting
prediction error blocks are available on lines 14960,
14961, ... 14964 and form an input for computation of
a single distortion measure per image-block per
spatio-temporal prediction. Each distortion measure
computed in 14970, 14971, ... 14974 is either a sum of
squares or a sum of absolute values and is output on
lines 14980, 14981, ... 14984 and forms an input to the
comparator 14990, which compares all the values
and outputs an index on line 14695 corresponding to
the smallest value per image-block. If MPEG-1 type
coding is employed, an image-block is equivalent to
a macroblock. In different contexts, an image-block
may be as large as an entire frame or field.
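
The selection performed by the analyzer can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `select_weight_index`, the `measure` parameter, the candidate weight list and the sample blocks are hypothetical, and a single spatial weight per candidate is assumed. The parallel weighters, differencers and distortion units of FIG. 14 are modeled here as a simple loop.

```python
import numpy as np

def select_weight_index(original, spatial, temporal, weights, measure="sad"):
    """Return (best_index, distortions) for one image-block.

    For each candidate weight, form the weighted prediction, difference it
    from the original block, and reduce the error block to one distortion
    value: sum of absolute values ("sad") or sum of squares ("sse").
    """
    original = np.asarray(original, dtype=np.float64)
    spatial = np.asarray(spatial, dtype=np.float64)
    temporal = np.asarray(temporal, dtype=np.float64)
    distortions = []
    for w in weights:
        prediction = w * spatial + (1.0 - w) * temporal
        error = original - prediction
        if measure == "sad":
            distortions.append(float(np.abs(error).sum()))
        elif measure == "sse":
            distortions.append(float((error ** 2).sum()))
        else:
            raise ValueError("measure must be 'sad' or 'sse'")
    # The comparator outputs the index of the smallest distortion.
    return int(np.argmin(distortions)), distortions

# Hypothetical blocks: the spatial prediction happens to match the original.
original = [[10, 20], [30, 40]]
spatial = [[10, 20], [30, 40]]
temporal = [[14, 16], [34, 36]]
best, costs = select_weight_index(original, spatial, temporal, (1.0, 0.5, 0.0))
```

The returned index is the value that would be passed to the weighter and written to the bit-stream; the distortion list is returned only to make the comparison visible.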

Weight look-up table 14870 contains separate
weight tables for the various illustrative forms of
spatial scalability (i.e. progressive-to-progressive,
progressive-to-interlace, interlace-to-progressive and
interlace-to-interlace) and, depending on the form of
spatial scalability chosen, a corresponding weight table
is selected.

FIG. 15 shows the details of the spatio-temporal
weighter employed in FIGS. 12, 13 and 14.
Spatio-temporal weighter 15000 takes the spatial prediction
signal obtained by interpolation of the base-layer signal
on line 15690 and the enhancement-layer temporal
prediction signal on line 15700. An index input on line
15695 is applied to form an address to the weight
table 15870, and the resulting weight(s) on line 15825
are applied to a multiplier 15820 having the spatial
prediction signal on line 15690 as its other input. Also,
these weight(s) are applied on line 15845, forming an
input to the differencer 15865. The other input is
forced to a value of '1', and the resulting
complementary weight(s) signal appears on the output line
15835, which in turn forms an input to the multiplier
15830. The temporal prediction signal on line 15700
forms the other input to the multiplier 15830. The
outputs of multipliers 15820 and 15830 appear on
lines 15840 and 15850 and form inputs to an adder
15860. The output of adder 15860 is the
spatio-temporal weighted prediction signal and appears on line
15875.

The above-described invention provides a tech- 
nique for deriving video images of varying resolutions 
from a single video source. It will be understood that 
the particular methods described are only illustrative 
of the principles of the present invention, and that various
modifications could be made by those skilled in
the art without departing from the spirit and scope of 
the present invention, which is limited only by the 
claims that follow. 



Claims 

1. A method of encoding a video signal comprising
the steps of:

receiving a digital video signal including a
succession of digital representations related to
picture elements of a video image where said digital
video signal has a characteristic resolution;

producing a first encoded version of said
received digital video signal having a resolution
less than or equal to said characteristic resolution;

producing a second encoded version of
said received digital video signal having a resolution
equal to said characteristic resolution;

producing a first prediction of said video
image from said first encoded version of said
received digital video signal;

producing a second prediction of said video
image from said second encoded version of
said received digital video signal;

combining said first prediction and said
second prediction to produce a combined prediction; and

employing said combined prediction to encode
said second encoded version of said received
digital video signal.

2. The method of claim 1 in which said step of producing
said first prediction includes producing a
spatial prediction.

3. The method of claim 2 in which said step of producing
said second prediction includes producing
a temporal prediction.

4. The method of claim 3 in which said step of combining
includes weighting said spatial prediction
and weighting said temporal prediction to produce
said combined prediction.

5. The method of claim 1 in which said step of encoding
a first encoded version includes encoding
an interlaced version of said received digital video
signal, or encoding a progressive version of
said received digital video signal.

6. The method of claim 5 in which said step of encoding
a second encoded version includes encoding
an interlaced version of said received digital
signal, or encoding a progressive version of
said received digital signal.

7. The method of claim 1 in which said step of encoding
a first encoded version includes encoding
using MPEG-1 coding standards, or encoding using
H.261 coding standards, or encoding using
MPEG-2 coding standards.

8. The method of claim 7 in which said step of encoding
a second encoded version includes encoding
using MPEG-2 encoding standards.

9. A method of decoding said first encoded version
and said second encoded version of claim 1 for
producing an unencoded video signal having at
least one of a plurality of predetermined characteristics.

10. The method of claim 9 wherein said at least one
of a plurality of predetermined characteristics is
resolution scale, or said at least one of a plurality
of predetermined characteristics is picture format.

11. An apparatus for encoding a video signal comprising:

a receiver for receiving a digital video signal
including a succession of digital representations
related to picture elements of a video image
where said digital video signal has a characteristic
resolution;

a means for producing a first encoded version
of said received digital video signal having a
resolution less than or equal to said characteristic
resolution;

a means for producing a second encoded
version of said received digital video signal having
a resolution equal to said characteristic resolution;

a means for producing a first prediction of
said video image from said first encoded version
of said received digital video signal;

a means for producing a second prediction
of said video image from said second encoded
version of said received digital video signal;

a means for combining said first prediction
and said second prediction to produce a combined
prediction; and

a means for employing said combined prediction
to encode said second encoded version of
said received digital video signal.

12. The apparatus of claim 11 in which said means
for producing said first prediction includes a
means for producing a spatial prediction.

13. The apparatus of claim 12 in which said means
for producing said second prediction includes a
means for producing a temporal prediction.

14. The apparatus of claim 13 in which said means
for combining includes a means for weighting
said spatial prediction and a means for weighting
said temporal prediction to produce said combined
prediction.

15. The apparatus of claim 11 in which said means
for encoding a first encoded version includes a
means for encoding an interlaced version of said
received digital video signal.

16. The apparatus of claim 11 in which said means
for encoding a first encoded version includes a
means for encoding a progressive version of said
received digital video signal.

17. The apparatus of claim 15 or 16 in which said
means for encoding a second encoded version
includes a means for encoding an interlaced version
of said received digital signal.

18. The apparatus of claim 15 or 16 in which said
means for encoding a second encoded version
includes a means for encoding a progressive version
of said received digital signal.

19. The apparatus of claim 11 in which said means
for encoding a first encoded version includes a
means for encoding using MPEG-1 coding standards,
or a means for encoding using H.261 coding
standards, or a means for encoding using
MPEG-2 coding standards.

20. The apparatus of claim 19 in which said means
for encoding a second encoded version includes
a means for encoding using MPEG-2 encoding
standards.